The objective of this module is to test for significant SNPs using a
joint approach. For this, we will use the penalizedLMM R
package, which is available on GitHub. Once
again, we will begin by loading the necessary libraries:
library(data.table)
library(magrittr)
library(qqman)
library(snpStats)
library(dplyr)
# devtools::install_github("areisett/penalizedLMM")
library(penalizedLMM)
Next, we load the data. Start by loading the clinical data, as this has the outcome (coronary artery disease (‘CAD’)) we need for our models.
clinical <- fread("data/GWAStutorial_clinical.csv")
# str(clinical) # if you need to remind yourself of what is here
We also need to load the genetic data. For this section, we will work
with the quality controlled (QC’d) data from the SNPRelate
package (see module 1). We will also need the “.bim” file from the
original data for making plots. Finally, we need our design matrix \(X\) with no missing values (this is the
\(X\) we obtained by our imputation
procedures).
# load QC'd data:
qc_dat <- readRDS('data/gwas-qc.rds')
# load the bim file
bim <- fread('data/GWAStutorial.bim')
# load design matrix of SNP data (you would need to complete the imputation module first)
X <- readRDS(file = "data/fully_imputed_numeric.rds")
If you completed the population structure module, you should load the principal components as well - we need them for our analysis. If you did not work through the population structure module, you can skip this step.
# load principal components
PCs <- readRDS(file = "data/PCs_base.rds") %>%
as.data.frame()
names(PCs) <- paste0("PC", 1:ncol(PCs))
# the CAD outcome is binary (0/1), so it may not be the best place to begin for the
# joint model example
joint_model <- plmm(X = X,
y = clinical$CAD,
intercept = FALSE)